{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Heatmaps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal of this notebook is to mention the importance of choosing an appropriate color palette. A heat map is a graphical representation of data where values are depicted by colors. Heatmaps allow you to easier spot where something happened and where it didn't. Consequently, what we choose for our color palette is important. Two types of color palettes are: \n",
    "\n",
    "1. Sequential: appropriate when data ranges from relatively low values to relatively high values. \n",
    "2. Qualitative: best when you want to distinguish discrete chunks of data that <b>do not have inherent ordering</b>.\n",
    "\n",
    "![](images/heatmapColorPalette.png)\n",
    "\n",
    "The data we will use is for a confusion matrix which is a table that is often used to describe the performance of a machine learning classification model. It can be used to tell you where the predictions went wrong. In the case of the images above, it is derived from predicting labels for digits from 0-9."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The ``inline`` flag will use the appropriate backend to make figures appear inline in the notebook.  \n",
    "%matplotlib inline\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# `plt` is an alias for the `matplotlib.pyplot` module\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# import seaborn library (wrapper of matp__lotlib)\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Data\n",
    "\n",
    "The data is a confusion matrix which is a table that is often used to describe the performance of a machine learning classification model. It tells you where the predictions went wrong. \n",
    "\n",
    "This particular table is derived from predicting labels for digits from 0-9."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "confusion = np.array([[37, 0,  0,  0,  0,  0,  0,  0,  0,  0],\n",
    "                      [0, 39,  0,  0,  0,  0,  1,  0,  2,  1],\n",
    "                      [0,  0, 41,  3,  0,  0,  0,  0,  0,  0],\n",
    "                      [0,  0,  0, 44,  0,  0,  0,  0,  1,  0],\n",
    "                      [0,  0,  0,  0, 37,  0,  0,  1,  0,  0],\n",
    "                      [0,  0,  0,  0,  0, 46,  0,  0,  0,  2],\n",
    "                      [0,  1,  0,  0,  0,  0, 51,  0,  0,  0],\n",
    "                      [0,  0,  0,  1,  1,  0,  0, 46,  0,  0],\n",
    "                      [0,  3,  1,  0,  0,  0,  0,  0, 44,  0],\n",
    "                      [0,  0,  0,  0,  0,  1,  0,  0,  2, 44]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Matplotlib\n",
    "Unfortunately using Matplotlib involves quite a of code for heatmaps. It is worth mentioning that Matplotlib definitely has flaws. \n",
    "\n",
    "1. Matplotlib defaults are not ideal (no grid lines, white background etc).\n",
    "2. The library is relatively low level. Doing anything complicated takes quite a bit of code. \n",
    "3. Not perfect integration with pandas data structures (though this is being improved)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this is a lot of code that is not trivial to create\n",
    "plt.figure(figsize=(6,6))\n",
    "plt.imshow(confusion, interpolation='nearest', cmap='Blues')\n",
    "plt.colorbar()\n",
    "tick_marks = np.arange(10)\n",
    "plt.xticks(tick_marks, [\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"], size = 10)\n",
    "plt.yticks(tick_marks, [\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"], size = 10)\n",
    "plt.tight_layout()\n",
    "plt.ylabel('Actual label', size = 15)\n",
    "plt.xlabel('Predicted label', size = 15)\n",
    "width, height = confusion.shape\n",
    "\n",
    "for x in range(width):\n",
    "    for y in range(height):\n",
    "        plt.annotate(str(confusion[x][y]), xy=(y, x), \n",
    "                    horizontalalignment='center',\n",
    "                    verticalalignment='center')\n",
    "        \n",
    "## Using Knowledge Learned Online. Comment the 4 lines below out if you have issues. \n",
    "b, t = plt.ylim() # discover the values for bottom and top\n",
    "b += 0.5 # Add 0.5 to the bottom\n",
    "t -= 0.5 # Subtract 0.5 from the top\n",
    "plt.ylim(b, t) # update the ylim(bottom, top) values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Seaborn\n",
    "\n",
    "Wrapper of matplotlib. One reason why you might want to plot using Seaborn is that it requires less syntax. Keep in mind that sometimes you will find it useful to use Matplotlib syntax to adjust the final plot output. In the case below, the Matplotlib syntax adds xlabels and ylabels. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Seaborn with Sequential Colormap"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sequential Sequential: appropriate when data ranges from relatively low\n",
    "# (uninteresting values) to relatively high (interesting values). \n",
    "plt.figure(figsize=(6,6))\n",
    "sns.heatmap(confusion, \n",
    "            annot=True,\n",
    "            cmap = 'Blues');\n",
    "plt.ylabel('Actual label');\n",
    "plt.xlabel('Predicted label');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Debugging Top and Bottom Cut Off\n",
    "\n",
    "For this particular graph and <b><u>version</b></u> of matplotlib/seaborn, notice how the top and bottom part of the graph is cutoff. By the time you take this class, it might not be a problem as these open source libraries are constantly being updated. I still enourage you to watch the video and see how to solve a problem so when one happens in the future whether it be with visualization or machine learning, you can better solve it. \n",
    "\n",
    "Google: seaborn heatmap top and bottom cut off\n",
    "https://www.google.com/search?q=seaborn+heatmap+top+and+bottom+cut+off&oq=seaborn+heatmap+top+cut+&aqs=chrome.1.69i57j0.6781j0j7&sourceid=chrome&ie=UTF-8\n",
    "\n",
    "MATLAB-style Solution: https://github.com/mwaskom/seaborn/issues/1773\n",
    "\n",
    "Object Oriented Solution: https://stackoverflow.com/questions/56942670/matplotlib-seaborn-first-and-last-row-cut-in-half-of-heatmap-plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b>MATLAB-style Fix </b>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sequential Sequential: appropriate when data ranges from relatively low\n",
    "# (uninteresting values) to relatively high (interesting values). \n",
    "plt.figure(figsize=(6,6))\n",
    "sns.heatmap(confusion, \n",
    "            annot=True,\n",
    "            cmap = 'Blues');\n",
    "plt.ylabel('Actual label');\n",
    "plt.xlabel('Predicted label');\n",
    "\n",
    "## Using Knowledge Learned Online\n",
    "b, t = plt.ylim() # discover the values for bottom and top\n",
    "b += 0.5 # Add 0.5 to the bottom\n",
    "t -= 0.5 # Subtract 0.5 from the top\n",
    "plt.ylim(b, t) # update the ylim(bottom, top) values\n",
    "plt.savefig('images/sequentialHeatmap.png', dpi = 300)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b>Object-oriented</b>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sequential Sequential: appropriate when data ranges from relatively low\n",
    "# (uninteresting values) to relatively high (interesting values). \n",
    "plt.figure(figsize=(6,6))\n",
    "axes = sns.heatmap(confusion, \n",
    "            annot=True,\n",
    "            cmap = 'Blues');\n",
    "plt.ylabel('Actual label');\n",
    "plt.xlabel('Predicted label');\n",
    "\n",
    "# Fix to make sure the top isn't cut off \n",
    "bottom, top = axes.get_ylim()\n",
    "axes.set_ylim(bottom + 0.5, top - 0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Seaborn with Qualitative Colormap\n",
    "Qualitative colormaps are best when you want to distinguish discrete chunks of data that <b>do not have inherent ordering</b>. This may not be the best choice for this data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(6,6))\n",
    "sns.heatmap(confusion, \n",
    "            annot=True,\n",
    "            cmap = 'Pastel1');\n",
    "plt.ylabel('Actual label');\n",
    "plt.xlabel('Predicted label');\n",
    "\n",
    "## Using Knowledge Learned Online\n",
    "b, t = plt.ylim() # discover the values for bottom and top\n",
    "b += 0.5 # Add 0.5 to the bottom\n",
    "t -= 0.5 # Subtract 0.5 from the top\n",
    "plt.ylim(b, t) # update the ylim(bottom, top) values\n",
    "\n",
    "plt.savefig('images/qualitativeHeatmap.png', dpi = 300)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
